166 research outputs found

Segmentation Based Multi-Cue Integration for Object Detection

Author: Leibe B
Mikolajczyk K
Schiele B
Publication venue: 'British Machine Vision Association and Society for Pattern Recognition'
Publication date: 01/01/2006

TUbiblio

Crossref

Surrey Research Insight

MPG.PuRe

Psychological Preparation of the Athlete in the Context of Scientific Psychological Research

Author: A. Ess
A. Geiger
B. Leibe
D. Gavrila
D. Geronimo
D. Nistér
K. Mikolajczyk
Publication date: 01/01/2011

Electronic archive of Tomsk Polytechnic University

Crossref

Class Representative Visual Words for Category-Level Object Recognition

Author: B. Leibe
K. Mikolajczyk
P. Perronnin
S. Belongie
T. Tuytelaars
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Recent works in object recognition often use visual words, i.e. vector quantized local descriptors extracted from the images. In this paper we present a novel method to build such a codebook with class representative vectors. This method, coined Cluster Precision Maximization (CPM), is based on a new measure of the cluster precision and on an optimization procedure that leads any clustering algorithm towards class representative visual words. We compare our procedure with other measures of cluster precision and present the integration of a Reciprocal Nearest Neighbor (RNN) clustering algorithm in the CPM method. In the experiments, on a subset of the Caltech101 database, we analyze several vocabularies obtained with different local descriptors and different clustering algorithms, and we show that the vocabularies obtained with the CPM process perform best in a category-level object recognition system using a Support Vector Machine (SVM). © 2009 Springer Berlin Heidelberg. López Sastre R.J., Tuytelaars T., Maldonado Bascón S., "Class representative visual words for category-level object recognition", Lecture Notes in Computer Science, vol. 5524, 2009 (4th Iberian Conference on Pattern Recognition and Image Analysis - IbPRIA 2009, June 10-12, 2009, Póvoa de Varzim, Portugal). status: publishe

Lirias

Crossref

Depth-From-Recognition: Inferring Meta-data by Cognitive Feedback

Author: Ferrari V.
Leibe B.
Thomas A.
Tuytelaars T.
Van Gool L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Crossref

Edinburgh Research Explorer

Towards Multi-View Object Class Detection

Author: Ferrari Vittorio
Leibe B.
Schiele B.
Thomas A.
Tuytelaars T.
Van Gool L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

We present a novel system for generic object class detection. In contrast to most existing systems, which focus on a single viewpoint or aspect, our approach can detect object instances from arbitrary viewpoints. This is achieved by combining the Implicit Shape Model for object class detection proposed by Leibe and Schiele with the multi-view specific object recognition system of Ferrari et al. After learning single-view codebooks, these are interconnected by so-called activation links, obtained through multi-view region tracks across different training views of individual object instances. During recognition, these integrated codebooks work together to determine the location and pose of the object. Experimental results demonstrate the viability of the approach and compare it to a bank of independent single-view detectors.

Lirias

TUbiblio

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

Edinburgh Research Explorer

MPG.PuRe

Hal-Diderot

HoughNet: Integrating Near and Long-Range Evidence for Bottom-Up Object Detection

Author: B Leibe
DH Ballard
E Akbas
E Gabriel
O Barinova
PF Felzenszwalb
S Belongie
T-Y Lin
VJ Traver
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

© 2020, Springer Nature Switzerland AG. This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies on only local evidence. On the COCO dataset, HoughNet’s best model achieves 46.4 AP (and 65.1 AP50), performing on par with the state-of-the-art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of our proposal in another task, namely, “labels to photo” image generation, by integrating the voting module of HoughNet into two different GAN models and showing that the accuracy is significantly improved in both cases. Code is available at https://github.com/nerminsamet/houghnet
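
The voting idea in this abstract (object presence at a location scored as the sum of votes cast on it, with votes binned on a log-polar field) can be illustrated with a minimal sketch. This is not the HoughNet implementation: the function names, the four angular sectors, the power-of-two ring spacing, and the hand-set bin weights are all assumptions made for illustration, and the evidence map is a toy input rather than a learned class-conditional map.

```python
import math

def log_polar_bin(dx, dy, num_sectors=4):
    """Map an integer pixel offset to a hypothetical (ring, sector) bin.

    Ring 0 is the center; ring k >= 1 covers distances in [2**(k-1), 2**k).
    """
    r = math.hypot(dx, dy)
    if r == 0:
        return (0, 0)
    ring = 1 + int(math.floor(math.log2(r)))
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi / num_sectors)) % num_sectors
    return (ring, sector)

def accumulate_votes(evidence, bin_weights, max_radius):
    """Sum weighted votes: each evidence cell votes for every target
    location within max_radius, weighted by the bin of the offset."""
    h, w = len(evidence), len(evidence[0])
    votes = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            e = evidence[y][x]
            if e == 0.0:
                continue  # nothing to vote with
            for dy in range(-max_radius, max_radius + 1):
                for dx in range(-max_radius, max_radius + 1):
                    ty, tx = y + dy, x + dx
                    if 0 <= ty < h and 0 <= tx < w:
                        weight = bin_weights.get(log_polar_bin(dx, dy), 0.0)
                        votes[ty][tx] += weight * e
    return votes
```

Because ring width grows with distance, a small number of bins can cover both nearby and far-away locations, which is one plausible reading of why a log-polar layout suits combining near and long-range evidence.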

arXiv.org e-Print Archive

Crossref

OpenMETU (Middle East Technical University)

Simultaneous Object Recognition and Segmentation by Image Exploration

Author: A. Selinger
B. Leibe
M.J. Swain
P.H.S. Torr
V. Ferrari
Z. Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Crossref

Edinburgh Research Explorer

Glimpse: Continuous, Real-Time Object Recognition on Mobile Devices

Author: Cao X.
Fan R.-E.
Ismail N.
Krizhevsky A.
Lange J. R.
Leibe B.
Lucas B. D.
Newton R.
Schmidhuber J.
Shi J.
Szegedy C.
Winstein K.
Zhuang Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2015

Glimpse is a continuous, real-time object recognition system for camera-equipped mobile devices. Glimpse captures full-motion video, locates objects of interest, recognizes and labels them, and tracks them from frame to frame for the user. Because the algorithms for object recognition entail significant computation, Glimpse runs them on server machines. When the latency between the server and mobile device is higher than a frame-time, this approach lowers object recognition accuracy. To regain accuracy, Glimpse uses an active cache of video frames on the mobile device. A subset of the frames in the active cache is used to track objects on the mobile device, using (stale) hints about objects that arrive from the server from time to time. To reduce network bandwidth usage, Glimpse computes trigger frames to send to the server for recognizing and labeling. Experiments with Android smartphones and Google Glass over Verizon, AT&T, and a campus Wi-Fi network show that with hardware face detection support (available on many mobile devices), Glimpse achieves precision between 96.4% and 99.8% for continuous face recognition, a 1.8-2.5× improvement over a scheme performing hardware face detection and server-side recognition without Glimpse's techniques. The improvement in precision for face recognition without hardware detection is 1.6-5.5×. For road sign recognition, which does not have a hardware detector, Glimpse achieves precision between 75% and 80%; without Glimpse, continuous detection is non-functional (0.2%-1.9% precision).
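
The active-cache idea described here, replaying a stale server hint through recently cached frames to bring it up to date, can be sketched as follows. This is a rough illustration, not Glimpse's code: the `ActiveCache` class, the `track_step` callable, and the fixed-stride choice of which cached frames to use are all hypothetical, since the abstract does not say how the subset of frames is selected.

```python
from collections import deque

class ActiveCache:
    """Bounded buffer of recent frames, oldest first (hypothetical sketch)."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)  # (frame_id, frame) pairs

    def add(self, frame_id, frame):
        self.frames.append((frame_id, frame))

    def catch_up(self, hint_frame_id, hint_state, track_step, stride=1):
        """Advance a stale hint to the present.

        hint_state is the server's result for frame hint_frame_id;
        track_step(state, frame) is an assumed tracker callable.
        Only every stride-th newer frame is used, counted back from
        the newest so the result always reflects the latest frame.
        """
        newer = [frame for fid, frame in self.frames if fid > hint_frame_id]
        selected = newer[::-1][::stride][::-1]
        state = hint_state
        for frame in selected:
            state = track_step(state, frame)
        return state
```

A larger stride trades tracking accuracy for less on-device computation; the right value would depend on how fast objects move relative to the frame rate.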

DSpace@MIT

Crossref

Accurate Single Image Multi-Modal Camera Pose Estimation

Author: B. Leibe
C.P. Lu
D.F. DeMenthon
D.G. Lowe
G. Penney
G. Vosselman
H. Bay
K. Mikolajczyk
P. David
P. Viola
R. Raguram
S. Benhimane
V. Lepetit
Publication date: 01/01/2012

A well-known problem in photogrammetry and computer vision is the precise and robust determination of camera poses with respect to a given 3D model. In this work we propose a novel multi-modal method for single image camera pose estimation with respect to 3D models with intensity information (e.g., LiDAR data with reflectance information). We utilize a direct point-based rendering approach to generate synthetic 2D views from 3D datasets in order to bridge the dimensionality gap. The proposed method then establishes 2D/2D point and local region correspondences based on a novel self-similarity distance measure. Correct correspondences are robustly identified by searching for small regions with a similar geometric relationship of local self-similarities using a Generalized Hough Transform. After backprojection of the generated features into 3D, a standard Perspective-n-Points problem is solved to yield an initial camera pose. The pose is then accurately refined using an intensity-based 2D/3D registration approach. An evaluation on Vis/IR 2D and airborne and terrestrial 3D datasets shows that the proposed method is applicable to a wide range of different sensor types. In addition, the approach outperforms standard global multi-modal 2D/3D registration approaches based on Mutual Information with respect to robustness and speed. Potential applications are widespread and include, for instance, multispectral texturing of 3D models, SLAM applications, sensor data fusion, multi-spectral camera calibration, and super-resolution applications.
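
The step of keeping only geometrically consistent correspondences via Hough-style voting can be illustrated with a deliberately simplified sketch. It is not the paper's method: the paper votes on geometric relationships between local self-similarity regions, whereas the version below assumes a pure 2D translation between the two views, and the function name and `bin_size` parameter are hypothetical.

```python
from collections import defaultdict

def filter_by_hough_translation(matches, bin_size=10.0):
    """Keep the matches that vote for the dominant translation bin.

    matches: list of ((x1, y1), (x2, y2)) point pairs.
    Each match votes for its quantized displacement; matches in the
    most-voted bin are treated as consistent, the rest as outliers.
    """
    bins = defaultdict(list)
    for i, ((x1, y1), (x2, y2)) in enumerate(matches):
        key = (round((x2 - x1) / bin_size), round((y2 - y1) / bin_size))
        bins[key].append(i)
    if not bins:
        return []
    best = max(bins.values(), key=len)
    return [matches[i] for i in best]
```

In a full pipeline the surviving matches would then be backprojected to 3D and passed to a Perspective-n-Points solver, as the abstract describes.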

CiteSeerX

Crossref

Using Multi-view Recognition and Meta-data Annotation to Guide a Robot's Attention

Author: Alexander Thomas
Bastian Leibe
Bay H.
Belongie S.
Brostow G.
Cheng L.
Cornelis N.
Everingham M.
Ferrari V.
Fraundorfer F.
Goedemé T.
Han F.
Hassner T.
Hoiem D.
Kushal A.
Leibe B.
Liebelt J.
Liu C.
Lowe D.G.
Luc Van Gool
Mumford D.
Munoz D.
Pantofaru C.
Posner I.
Russell B.
Savarese S.
Saxena A.
Seemann E.
Segvic S.
Thomas A.
Tinne Tuytelaars
Vittorio Ferrari
Yan P.
Publication venue: 'SAGE Publications'
Publication date: 01/01/2009

In the transition from industrial to service robotics, robots will have to deal with increasingly unpredictable and variable environments. We present a system that is able to recognize objects of a certain class in an image and to identify their parts for potential interactions. The method can recognize objects from arbitrary viewpoints and generalizes to instances that have never been observed during training, even if they are partially occluded and appear against cluttered backgrounds. Our approach builds on the implicit shape model of Leibe et al. We extend it to couple recognition to the provision of meta-dat

Lirias

CiteSeerX

Crossref

Edinburgh Research Explorer

Publikationsserver der RWTH Aachen University